Abstract: Some state-of-the-art multimedia source encoders produce
embedded source bit streams such that, upon reliable reception of
only a fraction of the total bit stream, the decoder is able to
reconstruct the source up to a basic quality. Reliable reception
of later source bits gradually improves the reconstruction quality.
Examples include scalable extensions of H.264/AVC and progressive
image coders such as JPEG2000. To provide efficient protection for
embedded source bit streams, a concatenated block coding scheme
using a minimum mean distortion criterion was considered in the
past. Although the original design was shown to achieve better mean
distortion characteristics than previous studies, the proposed
coding structure led to dramatic quality fluctuations. In this
paper, a modification of the original design is first presented, and
then the second order statistics of the distortion are taken into
account in the optimization. More specifically, an extension scheme
is proposed using a minimum distortion variance optimization
criterion. This robust system design is tested for an image
transmission scenario. Numerical results show that the proposed
extension achieves significantly lower variance than the original
design, while showing similar mean distortion performance using both
convolutional codes and low density parity check codes.



I. Introduction 

Multimedia transmission for heterogeneous receivers is a
challenging problem due to the unpredictable nature of the
communication channels. Recent advances in multimedia compression
technology aim to adapt to the time-varying and band-limited nature
of wireless channels. Progressive source coding is an attractive
solution for the transmission problems posed by multimedia streaming
over such channels. The bit stream is generally said to be embedded
if the removal of the end parts of the source bit stream enables
adaptations to end user preferences according to varying terminal
and network conditions. For example, the scalable extension of
H.264/AVC [1] allows reconstruction of the video at various bit
rates using partial bit streams (layers) at the expense of some loss
of coding efficiency compared to the single layer counterpart [2].
Also, the bit streams produced by SPIHT [3], JPEG2000 or the MPEG-4
fine grain scalable (FGS) coding [5] standards are embedded and
provide a bit-wise fine granularity in which the bit stream can be
truncated at any point for source decoding. However, embedded source
coders provide progressiveness at the expense of possessing some
features that make them vulnerable to channel bit errors. For
example, it is common to these source coders that the usefulness of
correctly received bits depends on the reliable reception of the
previous bits. Therefore, an efficient unequal error protection
(UEP) scheme is needed for the reliable transmission of such
multimedia data. Conventionally, less redundancy is added for each
layer with decreasing importance for decoding to allow a graceful
degradation of the source at the receiver [6].

Transmission of progressive sources over error-prone wireless
channels is a well investigated topic. Studies include various
cross-layer protection strategies for multimedia streaming over
wireless lossy networks [7] and adaptive selections of application
layer forward error correction (FEC) coding and deployment for
embedded bit streams [8], [9]. For the latter, joint source-channel
coding (JSCC) is the most popular approach. JSCC is used extensively
in the literature: an appropriate channel code is chosen to protect
the bit stream so as to optimize some criterion such as minimization
of mean distortion or maximization of average useful source rate
[10].

In a broadcast transmission scenario, each member of the network is
expected to receive at least a decent average multimedia quality in
order to meet the fair service guarantee. Excessive quality
fluctuations among the users of the same network can be avoided by
minimizing the variance of the distortion at the terminal of each
user [11]. The main contribution of this study is to consider an
efficient coding scheme in a broadcast scenario and introduce major
modifications to the original design of [12] for improved distortion
variance characteristics.

The concatenated block coded embedded bit streams are shown to give
superior performance over conventional coding paradigms while
providing flexible and low complexity transmission features over
multi-hop networks [12]. There are two assumptions about the
previous coding structure that will not fit in a broadcast
transmission scenario. First of all, in the original coding scheme,
some of the information block sizes (optimized for minimum
distortion) might be very large. Typically, the optimal number of
encoding stages ($M^*$) is reported to be four or five for the bit
budget constraints and raw channel bit error rates considered. This
means that there are five or six reconstruction levels at the
receiver. This may not be desirable, for example, from an image
transmission perspective, because the user will only be able to see
at most six different quality versions of the transmitted image,
with possibly large quality variations in between. Furthermore, it
often leads to user dissatisfaction.

Alternatively, each information block in the transmission system can
be chopped into smaller chunks to allow a larger number of
reconstruction possibilities at the receiver. Due to the embedded
nature of the bit stream, this can provide two advantages: (1) one
can obtain better mean distortion characteristics and (2) having
more reconstruction levels improves user satisfaction and increases
the overall service quality. In other words, the image quality is
not expected to vary dramatically because a larger set of
reconstruction levels is available at the receiver. However, having
a larger number of chunks in the system means that more redundancy
is allocated for error detection. Given the available bit budget
constraint, this will eventually leave less room for source and
channel coding bits. Thus, this paper carries out the optimization
needed to resolve this trade-off.

Secondly, the original optimization criterion was to minimize the
average distortion of the reconstructed source. Although this
criterion could be sufficient in a point-to-point communication
scenario, it is rarely sufficient in a broadcast transmission
scenario. In order to maintain a decent average source quality among
the network users, the second order statistics of the source
distortion have to be taken into account. A way to approach this
problem is presented in this paper: we consider the minimum
distortion variance problem subject to a predetermined average
source quality. This way, a reasonable mean source distortion can be
obtained while guaranteeing the minimum deviation from the mean
performance.

The remainder of this paper is organized as follows: In 
Section II, the background information about concatenated 
block codes for embedded source bit streams is explained in 
detail. In Section III, the proposed extension framework is 
presented and associated optimization criteria as well as the 
optimization problems are introduced. Some of the numerical 
results are given in Section IV. Finally, a brief summary and 
conclusions follow in Section V. 



II. Concatenated Block Coding for Embedded Bit Stream Transmission

Concatenated block codes are considered in [12] for embedded bit
stream transmission over error-prone memoryless channels. The
proposed $M$-codeword scheme is shown in Fig. 1 and can use any
discrete code set $C$. We give a brief description of the original
coding structure before giving the details of the extension scheme.

We first describe the coding structure for convolutional codes. The
first stage of the encoder is the concatenation of $b_1$ source bits
(i.e., source block $\mathcal{I}_1$) with two bytes of CRC$^1$
($N_c = 16$ bits) computed from those $b_1$ bits for error
detection. If convolutional codes are selected, they can still be
treated as block codes by appending $m$ zero tail bits to end the
trellis at the all-zero state. Therefore,
$|\mathcal{P}_1| = b_1 + N_c + m$ bits constitute the first payload
$\mathcal{P}_1$. Next, $\mathcal{P}_1$ is encoded using the channel
code rate $r_1 \in C$ to produce the codeword $c_1$. This ends the
first stage of encoding. In the next stage, $c_1$ is concatenated
with the second information block $\mathcal{I}_2$ (of size
$|\mathcal{I}_2| = b_2$), $N_c$ and $m$ bits to produce the second
payload $\mathcal{P}_2$ of size
$|\mathcal{P}_2| = (b_1 + N_c + m)/r_1 + b_2 + N_c + m$. In this
stage, the $N_c$ CRC bits are derived based only on those $b_2$
bits. After the interleaving, the bits in $\pi(\mathcal{P}_2)$ are
encoded using code rate $r_2 \in C$ to produce codeword $c_2$, where
$\pi(x)$ denotes the random block interleaving function that chooses
a permutation table randomly according to a uniform distribution and
permutes the entries of $x$ bitwise based on this table.$^2$ This
recursive encoding process continues until we encode the last
codeword $c_M$. Lastly, the codeword $c_M$ is transmitted over the
binary symmetric channel (BSC). Since the errors out of a maximum
likelihood sequence estimator (MLSE) are generally bursty, and some
of the block codes show poor performance when channel errors are not
independent and identically distributed (i.i.d.) [10], random block
interleavers are used to break up the long burst noise sequences.

$^1$ Here, a CRC polynomial is judiciously chosen to minimize the
undetected-error probability, and the same CRC polynomial is used
for all information blocks. The selected CRC polynomial is
$X^{16} + X^{12} + X^5 + 1$. Note that depending on the channel code
used, for example low density parity check codes, CRC bits may not
even be needed.

$^2$ We choose the size of the random permutation table to be equal
to the payload size in each encoding stage except the first.

Fig. 2: Encoding scheme using concatenated block coding and chopped
information blocks. $P_i$: parity bits for codeword $i$.
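The recursive growth of the payloads described above can be checked numerically. The following Python snippet is only a minimal sketch under stated assumptions, not the encoder used in this work: the function name `codeword_lengths`, the example block sizes and the example code rates are hypothetical, the tail length of 6 bits is an illustrative default, and lengths are kept as exact fractions rather than rounded to whole bits.

```python
from fractions import Fraction

def codeword_lengths(block_bits, rates, n_crc=16, n_tail=6):
    """Return (payload sizes, codeword lengths) for each encoding stage.

    Stage i payload: previous codeword + b_i source bits + CRC + tail bits.
    Stage i codeword length: payload size divided by the code rate r_i.
    All names and default values here are illustrative assumptions.
    """
    payloads, codewords = [], []
    prev = Fraction(0)  # no earlier codeword before the first stage
    for b, r in zip(block_bits, rates):
        payload = prev + b + n_crc + n_tail
        prev = payload / Fraction(r)
        payloads.append(payload)
        codewords.append(prev)
    return payloads, codewords

# Hypothetical two-stage example: b_1 = 400, b_2 = 800, rates 1/2 and 4/5.
payloads, codewords = codeword_lengths(
    [400, 800], [Fraction(1, 2), Fraction(4, 5)]
)
```

With these hypothetical inputs the first payload has 422 bits and the second has 844 + 800 + 22 = 1666 bits, which shows how the first codeword is protected again by the second code.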

The decoder performs the sequential encoding operations of the
encoder in reverse order on the noisy version ($\hat{c}_M$) of the
codeword $c_M$. In other words, the noisy codeword $\hat{c}_M$ is
decoded first using the corresponding channel decoder, and then the
deinterleaver is invoked to obtain
$\pi^{-1}(\pi(\hat{\mathcal{P}}_M)) = \hat{\mathcal{P}}_M$. Based on
the decision of the error detection mechanism (e.g., CRC code), the
$M$th information block ($\mathcal{I}_M$) is labeled as useful or
not for the reconstruction process. Thus, $\mathcal{I}_M$ is
associated with a label and peeled off from $\hat{\mathcal{P}}_M$.
In the subsequent decoding stage, $\hat{c}_{M-1}$ is decoded and
deinterleaved in the same manner to obtain $\hat{\mathcal{P}}_{M-1}$,
and $\mathcal{I}_{M-1}$ is determined to be useful or not in the
reconstruction. The decoding operation is finalized as soon as the
decoding of codeword $c_1$ and the label assignment of
$\mathcal{I}_1$ are performed. If the first label with a check
failure is associated with $\mathcal{I}_i$, then only the
information blocks up to but not including block $i$ are used to
reconstruct the source.
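The truncation rule at the end of the decoding process can be written in a few lines. The Python function below is a hedged sketch: the name `usable_blocks` and the list-of-booleans input format are assumptions made for illustration only.

```python
def usable_blocks(crc_ok):
    """Number of leading information blocks usable for reconstruction.

    crc_ok[i] is True when block i+1 passed its error check. Only the
    blocks before the first failed check are used, because later bits of
    an embedded stream are useless without the earlier ones.
    """
    count = 0
    for ok in crc_ok:
        if not ok:
            break
        count += 1
    return count

# Hypothetical labels: block 3 fails, so only blocks 1 and 2 are used,
# even though block 4 passed its own check.
n_used = usable_blocks([True, True, False, True])
```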

If we use low density parity check (LDPC) block codes, we do not
need the $N_c$ CRC bits and the $m$ tail bits, as the parity check
matrix of the code provides an inherent error detection capability.
However, similar to [8], an extra byte might be added for each chunk
to inform the RC-LDPC decoder about the channel coding rate used for
the next chunk. This can be thought of as protocol-based redundancy
allowed in the system, and it constrains the available bit budget in
transmission.

III. Extension System and Optimization 

A. Extension System 

In the original concatenated block coding scheme shown in Fig. 1,
there are $M$ encoding stages that produce a sequence of embedded
codewords. The number of reconstruction levels at the receiver is
$M + 1$. As mentioned previously, a small $M^*$ leads to large
variations in the quality of the reconstructed source. In the
extension system, each information block $\mathcal{I}_i$ plus the
corresponding $N_r$ redundant bits (for example, using convolutional
codes we have $N_r = N_c$) is chopped into smaller chunks of equal
size ($v$ bits each) in order to increase the number of
reconstruction levels at the receiver. This is illustrated in
Fig. 2. Each block of $(b_i + N_r)$ bits (we refer to this entity as
a "packet" later in the paper) is constrained to be an integer
multiple of $v$ bits, and the size of each information chunk is
$k = v - N_r$ bits. Let $m_i$ denote the number of separate chunks
in the $i$th encoding stage that make up the block $\mathcal{I}_i$
plus $N_c$ CRC bits in the original encoding scheme for
convolutional codes. Therefore, in the extension scheme, we have
$\sum_{i=1}^{M} m_i + 1 = \sum_{i=1}^{M} \lfloor (b_i + N_r)/v \rfloor + 1$
reconstruction levels. However, the proposed extension comes at a
cost: increasing the number of chunks increases the amount of
redundancy in the system. In the original design, the total number
of source bits is $\sum_{i=1}^{M} b_i$. In the proposed extension,
however, since $\sum_{i=1}^{M} (m_i - 1) N_r$ extra redundant bits
are used, the number of source bits is given by

$$\sum_{i=1}^{M} b_i - \sum_{i=1}^{M} (m_i - 1) N_r \le \sum_{i=1}^{M} b_i. \qquad (1)$$
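The bookkeeping of the extension scheme (chunk counts, reconstruction levels, and the source bits left after the extra redundancy) can be illustrated with a short Python sketch. The function name `extension_counts` and the example block sizes are hypothetical; the sketch assumes the same packet size as in the original design and $N_r = 16$ redundant bits per chunk.

```python
def extension_counts(block_bits, v, n_r=16):
    """Chunk counts, reconstruction levels, and source bits after chopping.

    Each packet of b_i + N_r bits must be an integer multiple of the
    chunk size v; every chunk then carries k = v - N_r source bits.
    """
    chunks = []
    for b in block_bits:
        packet = b + n_r
        if packet % v != 0:
            raise ValueError("packet size must be an integer multiple of v")
        chunks.append(packet // v)
    levels = sum(chunks) + 1
    # Each block pays (m_i - 1) * N_r extra redundant bits.
    source_bits = sum(block_bits) - sum((m - 1) * n_r for m in chunks)
    return chunks, levels, source_bits

# Hypothetical example: packets of 400 and 800 bits, chunk size v = 100.
chunks, levels, source_bits = extension_counts([384, 784], v=100)
```

In this made-up example the 3 reconstruction levels of the original design become 13, while the source bits drop from 1168 to 1008, that is, 12 chunks of $k = 84$ bits each.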

B. Minimum mean distortion and minimum distortion variance rate
allocations

In the original study, the minimum mean distortion design criterion
is assumed [12]. Alternatively, we can minimize the distortion
variance subject to a constraint on the mean distortion performance.
In other words, the distortion variance can be minimized such that
the average distortion of the system is lower than or equal to some
predetermined mean distortion value $\gamma_D$.

Let us assume that we are able to collect $r_{tr}$ channel bits per
source sample (e.g., pixel). We denote the available code rate set
by $C = \{r_1, r_2, \ldots, r_J\}$. Let us have $M$ encoding stages
with $m_i$, $1 \le i \le M$, chunks in the $i$-th coding stage. For
$i > M$, we define $m_i = 1$ for completeness. We use the
concatenated block coding mechanism to encode the information chunks
and produce the coded bit stream. A code allocation policy $\pi$
allocates the channel code $c_\pi^{(i)} \in C$ to be used in the
$i$-th stage of the algorithm. Note that the number of packets in
each information block depends on $\pi$ and is therefore denoted as
$m_i(\pi)$ hereafter. The length of the outermost codeword is given
by

$$r_{tr} N_s = \sum_{i=1}^{M} \frac{m_i(\pi)\, v}{\prod_{j=i}^{M} r_\pi^{(j)}}, \qquad (2)$$

where $r_\pi^{(j)}$ is the rate of the code $c_\pi^{(j)}$ and $N_s$
is the number of source samples.
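The transmission rate that appears in the rate constraints of the optimization problems in Section III-C can be evaluated as follows. This Python function is a sketch under assumptions: the name `transmission_rate` is hypothetical, and the expression (chunks of $v$ bits in stage $i$, each divided by the product of the code rates of stages $i$ through $M$) ignores any tail bits.

```python
from fractions import Fraction

def transmission_rate(chunks, rates, v, n_samples):
    """Channel bits per source sample.

    Computes (1/N_s) * sum_i m_i * v / prod_{j >= i} r_j, where chunks[i]
    is m_i and rates[i] is the code rate used in stage i (0-based).
    """
    total = Fraction(0)
    for i, m in enumerate(chunks):
        prod = Fraction(1)
        for r in rates[i:]:          # stage i is protected by codes i..M
            prod *= Fraction(r)
        total += Fraction(m * v) / prod
    return total / n_samples

# Hypothetical example: m = (4, 8), v = 100 bits, N_s = 1000 samples.
r_tr = transmission_rate([4, 8], [Fraction(1, 2), Fraction(4, 5)], 100, 1000)
```

Here the inner block costs 400 / (1/2 * 4/5) = 1000 channel bits and the outer block 800 / (4/5) = 1000 channel bits, so the hypothetical rate is 2 channel bits per sample.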

Assumption 1: For a tractable analysis, we assume perfect 
error detection. 

For a given channel, let the probability of decoding failure (for
example, the CRC code flags a failure) for chunk $i$ of information
block $z$ (where
$\sum_{j=1}^{z-1} m_j(\pi) < i \le \sum_{j=1}^{z} m_j(\pi)$), which
is protected by the sequence of channel codes
$c_\pi^{(z)}, c_\pi^{(z+1)}, \ldots, c_\pi^{(M)} \in C$, be
$P_e(c_\pi^{(z,M)})$ for $z \le M$. For $z > M$, we define
$P_e(c_\pi^{(z,M)}) = 1$. Let the operational rate-distortion
function of the source encoder be $D(R)$, where $R$ is the source
rate in bits per source sample.

Assumption 2: For algorithm design purposes, we use an approximation
similar to that in [12]: the decoder failure rate is independent for
each coded information chunk. This approximation is shown to be good
when convolutional and LDPC codes are used with long enough
interleavers [12]. In general, our code set $C$ can be chosen from
any code family with a bit processing method (such as interleaving)
as long as this assumption closely approximates the code block error
performance.

Lemma 1: Using Assumption 2, the $n$-th moment of the distortion at
the receiver using the policy $\pi$, $D_\pi(n)$, is given by
Equation (3).

Proof: Let $X$ be a random variable that takes on the distortion
level $d$ with probability $p_d = \Pr(X = d)$. Consider the
probability of truncating the chunk stream after reliably receiving
the $i$th chunk of the $j$th information block. This corresponds to
the source decoder that reconstructs the source up to a distortion
level
$d_{j,i} = D\big((\sum_{t=1}^{j-1} m_t(\pi) + i)\, k / N_s\big)$,
while the number of correctly decoded chunks is
$\sum_{t=1}^{j-1} m_t(\pi) + i$. Therefore,

$$p_{d_{j,i}} = P_e(c_\pi^{(j,M)}) \big(1 - P_e(c_\pi^{(j,M)})\big)^{i} \prod_{t=1}^{j-1} \big(1 - P_e(c_\pi^{(t,M)})\big)^{m_t(\pi)}.$$

Thus, using Assumption 2, the $n$-th moment of the distortion is
simply given as follows:

$$D_\pi(n) = E[X^n] = \sum_{j=1}^{M+1} \sum_{i=0}^{m_j(\pi)-1} d_{j,i}^{\,n} \times p_{d_{j,i}}. \qquad (3)$$

Finally, note that $j = 1, \ldots, M$ and
$i = 0, \ldots, m_j(\pi) - 1$ cover all the possibilities except the
event that we receive all the chunks correctly. This is fixed by
letting $j = M + 1$ and $m_{M+1}(\pi) = 1$. ■
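The moment computation in Lemma 1 can be sketched in Python as follows. This is an illustrative sketch and not the authors' code: the function name `distortion_moments`, the per-block failure probabilities and the toy rate-distortion function `2 ** (-r)` are hypothetical placeholders, and chunk failures are treated as independent as in Assumption 2.

```python
def distortion_moments(chunks, p_fail, k, n_samples, D, orders=(1, 2)):
    """Moments of the reconstruction distortion for independent failures.

    chunks[j] : number of chunks m_j in information block j (0-based)
    p_fail[j] : failure probability of any chunk in block j
    k         : source bits per chunk
    D         : rate-distortion function of the source rate (bits/sample)
    Returns (dict of moments keyed by order, total probability mass).
    """
    moments = {n: 0.0 for n in orders}
    survive = 1.0    # probability that every chunk so far was correct
    received = 0     # number of correctly decoded chunks so far
    total_prob = 0.0
    for m, pe in zip(chunks, p_fail):
        for _ in range(m):
            p = survive * pe     # stream is truncated at this chunk
            d = D(received * k / n_samples)
            for n in orders:
                moments[n] += (d ** n) * p
            total_prob += p
            survive *= 1.0 - pe
            received += 1
    # Terminal event (j = M + 1): all chunks were received correctly.
    d = D(received * k / n_samples)
    for n in orders:
        moments[n] += (d ** n) * survive
    total_prob += survive
    return moments, total_prob

# Hypothetical toy case: two blocks of one chunk each, P_e = 0.5 for both.
toy_moments, toy_mass = distortion_moments(
    [1, 1], [0.5, 0.5], k=1, n_samples=1, D=lambda r: 2.0 ** (-r)
)
toy_variance = toy_moments[2] - toy_moments[1] ** 2
```

The returned probability mass should equal one, which is a quick check that the terminal event closes the set of truncation events.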

C. Optimization Problems 

Next, we present the optimization problems considered in this study.
We start with the original optimization problem, i.e., Minimization
of Mean Distortion, and then give the Constrained Minimization of
Distortion Variance problem for the proposed extension. Finally, we
consider Minimum Second Moment of Distortion as an alternative
solution for the latter.

Problem 1: (Minimization of Mean Distortion)

$$\min_{\pi,\,\mathbf{b}} D_\pi(1) \quad \text{such that} \quad r_{tr} = \frac{1}{N_s} \sum_{i=1}^{M} \frac{m_i(\pi)\, v}{\prod_{j=i}^{M} r_\pi^{(j)}} \le B \qquad (4)$$

where $\mathbf{b} = \{b_1, \ldots, b_M\}$ and $B$ is some threshold
transmission rate in bits per source sample. As mentioned in the
introduction section, we are interested in the minimization of the
distortion variance subject to an average source quality constraint.
This problem can be formulated as follows.

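Problem 2: (Constrained Minimization of Distortion Variance)

$$\min_{\pi,\,\mathbf{b}} \sigma_\pi^2 \quad \text{such that} \quad r_{tr} = \frac{1}{N_s} \sum_{i=1}^{M} \frac{m_i(\pi)\, v}{\prod_{j=i}^{M} r_\pi^{(j)}} \le B, \quad D_\pi(1) \le \gamma_D$$

where

$$\sigma_\pi^2 = \sum_{j=1}^{M+1} \sum_{i=0}^{m_j(\pi)-1} \big(d_{j,i} - D_\pi(1)\big)^2\, p_{d_{j,i}} \qquad (5)$$

and $\gamma_D$ is some mean distortion constraint on the average
performance of the extension system.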

Problem 2 is harder than Problem 1 because each term of the sum in
Equation (5) now depends on the average distortion, which in turn
depends on the parameters of the system subject to optimization.
This problem can be simplified by the following observation.

Note that we have $D_\pi(2) \ge D_\pi(1)^2$ because, by definition,
the variance $\sigma_\pi^2 = D_\pi(2) - D_\pi(1)^2$ cannot be
negative. This means that $D_\pi(1)$ is upper bounded, and when the
equality holds ($D_\pi(2) = D_\pi(1)^2$), the variance is minimized.
On the contrary, if we allow a lower $D_\pi(1)$ in order to obtain a
better mean distortion, we will get a positive variance
($\sigma_\pi^2 > 0$). Thus, it is reasonable to assume that
$\sigma_\pi^2$ is a non-increasing function of $D_\pi(1)$ using the
policy $\pi$. In light of this assumption, we will set
$D_\pi(1) = \gamma_D$, so that minimizing the variance amounts to
minimizing the second moment, and end up with an easier problem to
solve:

Problem 3: (Minimization of the Second Moment of Distortion)

$$\min_{\pi,\,\mathbf{b}} D_\pi(2) \quad \text{subject to} \quad r_{tr} = \frac{1}{N_s} \sum_{i=1}^{M} \frac{m_i(\pi)\, v}{\prod_{j=i}^{M} r_\pi^{(j)}} \le B \qquad (6)$$

This problem gives the optimal solution of Problem 2 provided that
Problem 2 achieves its minimum when the mean distortion hits the
boundary of the constraint set. We solve the aforementioned
optimization problems numerically, employing a constrained
exhaustive search to find the optimal code allocation policy of the
system.
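A constrained exhaustive search of this kind can be sketched as below. Everything in this Python snippet is an illustrative assumption rather than the search actually used in this work: the function names, the toy rate-distortion function `toy_D`, and in particular the toy failure model `toy_p_fail`, which merely makes stronger cumulative protection fail less often. The search enumerates code-rate tuples and chunk counts, discards candidates that violate the rate budget (and, optionally, a cap on the mean distortion), and keeps the candidate with the smallest first or second moment.

```python
import itertools
import math

def moments(chunks, p_fail, k, n_samples, D):
    """First and second distortion moments for independent chunk failures."""
    d1 = d2 = 0.0
    survive, received = 1.0, 0
    for m, pe in zip(chunks, p_fail):
        for _ in range(m):
            d = D(received * k / n_samples)
            d1 += d * survive * pe
            d2 += d * d * survive * pe
            survive *= 1.0 - pe
            received += 1
    d = D(received * k / n_samples)      # event: all chunks correct
    return d1 + d * survive, d2 + d * d * survive

def rate(chunks, rates, v, n_samples):
    """Channel bits per sample: (1/N_s) * sum_i m_i*v / prod_{j>=i} r_j."""
    total = 0.0
    for i, m in enumerate(chunks):
        total += m * v / math.prod(rates[i:])
    return total / n_samples

def search(code_set, p_fail_fn, max_chunks, v, k, n_samples, D, budget,
           M=2, order=1, gamma=None):
    """Exhaustively pick (rates, chunks) minimising the chosen moment.

    order=1 mimics the mean-distortion search; order=2 with a mean cap
    `gamma` mimics the constrained second-moment search.
    Returns (cost, rates, chunks, mean, variance) or None if infeasible.
    """
    best = None
    for rates in itertools.product(code_set, repeat=M):
        p_fail = [p_fail_fn(rates[z:]) for z in range(M)]
        for chunks in itertools.product(range(1, max_chunks + 1), repeat=M):
            if rate(chunks, rates, v, n_samples) > budget:
                continue
            d1, d2 = moments(chunks, p_fail, k, n_samples, D)
            if gamma is not None and d1 > gamma:
                continue
            cost = d1 if order == 1 else d2
            if best is None or cost < best[0]:
                best = (cost, rates, chunks, d1, d2 - d1 * d1)
    return best

# Hypothetical toy models, for illustration only.
toy_D = lambda r: 2.0 ** (-2.0 * r)
toy_p_fail = lambda rs: math.prod(rs) ** 3
min_ave = search([0.5, 0.8], toy_p_fail, 4, 100, 84, 1000, toy_D, budget=2.0)
min_var = search([0.5, 0.8], toy_p_fail, 4, 100, 84, 1000, toy_D,
                 budget=2.0, order=2, gamma=min_ave[3])
```

Using the mean of the first search as the cap `gamma` in the second search guarantees that the second solution has a variance no larger than the first one at the same mean distortion, which mirrors the role of the mean distortion constraint in Problem 2.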

IV. Numerical Results 

We consider both the original as well as the extension schemes with
two different optimization criteria. In general, we have four
different possible combinations:

• ConMinAve: Concatenated coding with minimum average distortion
optimization criterion. Let the minimum distortion be denoted as
$d^*$ at the optimum.

• ConMinVar: Concatenated coding with minimum distortion variance
optimization criterion.

• ConChopMinAve: Extension scheme with minimum average distortion
optimization criterion.

• ConChopMinVar: Extension scheme with minimum distortion variance
optimization criterion subject to a minimum distortion constraint
$\gamma_D \le d^*$.

We do not consider the system ConMinVar, simply because we intend to
show how the "chopping" method can be instrumental in improving the
performance of the original concatenated coding design. In addition,
an improvement in the distortion variance is expected whenever we
allow a worse mean distortion performance in the system.

Also, since we constrain the information packet size to be a
multiple of $v$ and have a discrete number of code rates in the code
set $C$, it is not always possible to meet the average distortion
constraint with equality, i.e., $\gamma_D = d^*$. Thus, in solving
Problem 3, we allow a margin of $\zeta$ in order to find the best
approximate solution. In other words, in our simulation results we
have $|\gamma_D - d^*| < \zeta$.

We use the 512x512 monochromatic images Lena and Goldhill with the
SPIHT and JPEG2000 progressive image coders. In the first
simulation, we set $v = 850$ bits and $M = 2$, and use rate
compatible punctured convolutional (RCPC) codes with memory 6 [14].
We simulate all three systems and report average distortion and
distortion variance performances as functions of the transmission
rate in bits per pixel (bpp) when $\epsilon_0 = 0.05$. In all the
simulation results using RCPC codes, $\gamma_D \le d^*$ holds within
the margin $\zeta$. As can be seen, chopping the information blocks
into smaller chunks helps decrease the mean distortion and the
distortion variance at almost all the transmission rates of
interest. In addition, by allowing some performance degradation in
mean distortion, we can obtain much better distortion variance
characteristics.

In the second simulation, we set $r_{tr} = 0.5$ bpp and $M = 2$ and
vary $v$ to see the effect of variable chunk size on the overall
performance. First of all, a smaller chunk size does not necessarily
mean better performance, as the number of redundant CRC bits
increases and consumes the available bit budget. Consider the system
ConChopMinVar. We have seen in the previous simulation that chopping
helps to improve the mean system performance. Thus, for a given $M$,
we can find an optimum chunk size that will minimize the distortion
variance given that it satisfies a mean distortion constraint. In
Fig. 4, we note that as we move from left to right on the abscissa,
the number of reconstruction levels increases, i.e., the block size
decreases, the number of blocks increases and the amount of
redundancy used for error detection in the bit budget increases.
Also, we observe that as we sacrifice some mean distortion
performance, we obtain a decrease in distortion variance. This
numerical example supports the assumption made in Section III-C
about the relationship between the mean distortion and the
distortion variance. They are observed to be inversely related.

In Fig. 4, we observe that the minimum variance is achieved when the
block size hits 340 bits while satisfying the desired mean
distortion constraint $d^* = 41.79$ with equality. At the optimum,
ConMinAve has only 3 reconstruction levels (since $M = 2$) at the
receiver, while ConChopMinVar has 126 different reconstruction
levels. ConMinAve has a variance of 22.65, which is shown as a
horizontal line for comparison. The variance of ConChopMinVar shows
a jump after achieving the optimum at a variance of 9.53 (an almost
58% decrease from that of ConMinAve). This is because as we have
more chunks, and therefore more reconstruction levels, CRC bits
become dominant in the system. In order to satisfy the mean
distortion constraint, the optimization mechanism changes the
optimum channel code rates from (4/5, 4/9) to (8/9, 4/11). Having
more powerful protection now decreases the mean distortion value
while causing an increase in the total variance. Thus, ConChopMinVar
has 52% less distortion variance compared to ConMinAve while both
systems have almost the same mean distortion characteristics.
Table I presents a set of performance results using different images
and transmission rates at various raw channel BERs. As can be
observed, dramatic improvements on the variance characteristics of
the original design are possible using the extension system.

Table I: Simulation results using RCPC codes. Average performances
are reported as mean square error (MSE) values.

Table II: Simulation results using rate compatible LDPC codes and
JPEG2000 source coder. Average performances are reported as mean
square error (MSE) values. PSNR values in dB are included in
parentheses next to MSE results.

Finally, in Table II we provide some of the simulation results using
rate compatible LDPC codes [13]. We observe that
$\gamma_D \approx d^*$ (i.e., $\max \zeta = 0.7$) can be achieved
using LDPC codes. However, we can obtain dramatic improvements in
variance performance at the expense of little loss in expected
distortion performance of the original design. Table II presents a
set of performance results using different images and transmission
rates at various raw channel BERs considered in [8] and [12]. As can
be observed, similar performance gains are possible. For example, at
a transmission rate $r_{tr} = 0.505$ bpp and $\epsilon_0 = 0.05$,
ConMinAve chooses (4/5, 2/3) as the two optimal code rates with
three levels of reconstruction since $M = 2$. In the extension
scheme ConChopMinVar, choosing $v = 2000$ bits and using the same
optimal code rate pair, we obtained 44 different levels of
reconstruction. The latter design gives almost the same image
quality (approximately 35.3 dB) with a dramatic improvement in the
variance, i.e., around a 96.8% decrease in variance compared to that
of ConMinAve.

V. Conclusions

We have considered a minimum variance concatenated block encoding
scheme for progressive source transmissions. A non-trivial extension
of the original design is introduced with better reconstruction
properties at the receiver and, more importantly, better distortion
variance characteristics at a given average reconstruction quality.
We have considered three different optimization problems and
simplified the distortion variance minimization problem. Simulation
results show that dramatic improvements can be obtained with the
extension system compared to the original coding scheme.